CP-NAS: Child-Parent Neural Architecture Search for 1-bit CNNs

Algorithm 9 Child-Parent NAS

Input: Training data, validation data
Parameter: Search hyper-graph $G$; $K = 8$; selection evaluation $e(o_k^{(i,j)}) = 0$ for all edges
Output: Optimal structure $\alpha$

1: while $K > 1$ do
2:   for $t = 1, \ldots, T$ do
3:     for $e = 1, \ldots, K$ do
4:       Select an architecture by sampling (without replacement) one operation from $O^{(i,j)}$ for every edge;
5:       Construct the Child model and the Parent model with the same selected architecture, train both models to get the accuracy on the validation data, then use Eq. 4.15 to compute the performance and assign it to all the sampled operations;
6:     end for
7:   end for
8:   Update $e(o_k^{(i,j)})$ using Eq. 4.16;
9:   Reduce the search space $\{O^{(i,j)}\}$ by removing, from each edge, the operation with the worst performance evaluation $e(o_k^{(i,j)})$;
10:  $K = K - 1$;
11: end while
12: return the optimal structure $\alpha$
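For a sense of the search cost implied by Algorithm 9: with the default $K = 8$, the while loop performs seven reduction rounds ($K = 8, 7, \ldots, 2$), and a round with $K$ remaining operations trains $K \cdot T$ Child-Parent pairs for one epoch each, so the whole search amounts to $(8 + 7 + \cdots + 2)\,T = 35T$ one-epoch Child-Parent trainings.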

4.3.3 Search Strategy for CP-NAS

As shown in Fig. 4.4, we randomly sample one operation from the $K$ operations in $O^{(i,j)}$ for every edge and then obtain the performance based on Eq. 4.15 by training the sampled Parent and Child networks for one epoch. Finally, we assign this performance to all the sampled operations. These steps are performed $K$ times by sampling without replacement, so that each operation on every edge receives exactly one accuracy per round, which ensures fairness.
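To make the fairness constraint concrete, the sketch below builds one round of $K$ architectures by drawing an independent random permutation of the $K$ operation indices for each edge; the $t$-th sampled architecture takes, on each edge, the $t$-th entry of that edge's permutation, so every operation on every edge is trained exactly once per round. The function name, the edge count (14, as in a DARTS-style cell), and the index encoding are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sample_fair_round(num_edges: int, K: int, rng: np.random.Generator):
    """Sample K architectures so that each of the K candidate operations
    on every edge is selected exactly once (sampling without replacement).

    Returns an array of shape (K, num_edges); row t gives, for each edge,
    the index of the operation used by the t-th sampled architecture.
    """
    # One independent permutation of the K operation indices per edge.
    perms = np.stack([rng.permutation(K) for _ in range(num_edges)], axis=1)
    return perms  # perms[t, e] = operation chosen on edge e in architecture t

rng = np.random.default_rng(0)
round_archs = sample_fair_round(num_edges=14, K=8, rng=rng)
# Fairness check: every operation appears exactly once on every edge.
assert all(set(round_archs[:, e]) == set(range(8)) for e in range(14))
```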

We repeat the complete sampling process $T$ times. Thus, each operation on every edge has $T$ performances $\{z_{k,1}^{(i,j)}, z_{k,2}^{(i,j)}, \ldots, z_{k,T}^{(i,j)}\}$ calculated by Eq. 4.15. Furthermore, to reduce undesired fluctuation in the performance evaluation, we normalize the performance of the $K$ operations for each edge to obtain the final evaluation indicator as

$$
e(o_k^{(i,j)}) = \frac{\exp\{\bar{z}_k^{(i,j)}\}}{\sum_k \exp\{\bar{z}_k^{(i,j)}\}},
\tag{4.16}
$$

where $\bar{z}_k^{(i,j)} = \frac{1}{T} \sum_t z_{k,t}^{(i,j)}$. As the number of epochs increases, we progressively abandon the operation with the worst evaluation from each edge until only one operation remains for each edge.
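As a minimal sketch of this evaluation step, assume the performances $z_{k,t}^{(i,j)}$ are stored in an array of shape (num_edges, K, T); we average over the $T$ rounds, softmax-normalize over the $K$ operations of each edge as in Eq. 4.16, and locate the worst-scoring operation of each edge for removal. The array layout and helper names are our own assumptions.

```python
import numpy as np

def evaluation_indicator(z: np.ndarray) -> np.ndarray:
    """Compute e(o_k^{(i,j)}) of Eq. 4.16.

    z has shape (num_edges, K, T): z[e, k, t] is the performance assigned
    to operation k on edge e in sampling round t.
    """
    z_bar = z.mean(axis=2)                      # average over the T rounds
    z_bar -= z_bar.max(axis=1, keepdims=True)   # stabilize the softmax
    e = np.exp(z_bar)
    return e / e.sum(axis=1, keepdims=True)     # normalize over K operations

def worst_operation(z: np.ndarray) -> np.ndarray:
    """Index of the operation to abandon on each edge."""
    return evaluation_indicator(z).argmin(axis=1)
```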

4.3.4 Optimization of the 1-Bit CNNs

Inspired by XNOR and PCNN, we reformulate the binarized optimization of our unified framework as a Child-Parent optimization.

To binarize the weights and activations of CNNs, we introduce a kernel-level Child-Parent loss for binarized optimization in two respects. First, we minimize the distributional difference between the full-precision filters and the corresponding binarized filters. Second, we minimize an intraclass compactness term based on the output features. We then have the loss function

$$
L_{\hat{H}} = \sum_{c,l} \mathrm{MSE}(H_c^l, \hat{H}_c^l) + \frac{\lambda}{2} \sum_s \left\| f_{C,s}(\hat{H}) - \bar{f}_{C,s}(H) \right\|^2,
\tag{4.17}
$$
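The sketch below implements Eq. 4.17 under one reading of the notation: the first term sums, over layers $l$ and filters $c$, the MSE between each full-precision filter $H_c^l$ and its binarized counterpart $\hat{H}_c^l$; the second pulls each output feature $f_{C,s}(\hat{H})$ of the binarized Child toward the mean feature $\bar{f}_{C,s}(H)$ of its class. The tensor layout, the use of the Parent's features for the class means, and all names are illustrative assumptions, not the authors' code.

```python
import torch

def child_parent_loss(filters, bin_filters, child_feats, parent_feats,
                      labels, lam: float):
    """Illustrative sketch of the kernel-level Child-Parent loss (Eq. 4.17).

    filters / bin_filters: per-layer full-precision filters H_c^l and their
        binarized counterparts; each tensor has shape (C_out, ...).
    child_feats / parent_feats: (N, D) output features f_{C,s} of the
        binarized Child and full-precision Parent; labels: (N,) class ids.
    """
    # First term: distribution difference between full-precision and
    # binarized filters: MSE per filter c, summed over c and layers l.
    mse = sum((h - h_bin).pow(2).flatten(1).mean(dim=1).sum()
              for h, h_bin in zip(filters, bin_filters))

    # Second term: intraclass compactness of the output features,
    # ||f_{C,s}(H_hat) - mean feature of class C_s||^2 summed over s.
    compact = torch.zeros(())
    for c in labels.unique():
        mask = labels == c
        center = parent_feats[mask].mean(dim=0)  # class mean \bar{f}_{C,s}
        compact = compact + (child_feats[mask] - center).pow(2).sum()

    return mse + 0.5 * lam * compact
```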